Amid rising demands for food security, a changing climate, and pressure on natural resources, precision agriculture has become a key approach to improving agricultural productivity. This research develops KRISHIMITRA, an explainable machine learning model for crop recommendation built on Extreme Gradient Boosting (XGBoost). The system is trained on a large dataset of approximately 57,000 farming records, from which 53,127 good-quality samples covering 55 crop varieties were retained. It uses environmental and soil factors as features: temperature, relative humidity, soil pH, soil nutrients (N, P, K), crop duration, and water requirement. The XGBoost model achieves a test accuracy of 99.22%, a cross-validation accuracy of 99.36% ± 0.08%, and a small train-test difference (0.24%), which suggests good generalisation. To improve interpretability, feature importance analysis is performed; crop duration, water requirement, and relative humidity emerge as the most significant features. To validate the results, the proposed model is compared with baseline models: Naïve Bayes, Decision Tree, Support Vector Machine, Random Forest, and Gradient Boosting. The use of explainable AI provides transparency, which supports deployment of the system in agricultural practice. The findings demonstrate the promise of combining big data, ensemble methods, and interpretability methods to advance precision agriculture.
Introduction
This study presents KRISHIMITRA, an intelligent crop recommendation system designed to support precision agriculture using XGBoost and Explainable Artificial Intelligence (XAI). Agriculture is vital to India's economy, but traditional farming practices often struggle to address challenges such as climate change, soil degradation, and water scarcity. Precision agriculture technologies, including machine learning, offer opportunities to improve crop productivity and resource utilization through data-driven decision-making.
Many existing crop recommendation systems suffer from limitations such as small datasets, poor scalability, low accuracy, and lack of interpretability. To overcome these issues, the proposed KRISHIMITRA system employs XGBoost, a powerful ensemble learning algorithm capable of handling large datasets efficiently while providing insights into prediction decisions through feature importance analysis.
The study uses a large Kaggle agricultural dataset containing approximately 57,000 records, which was cleaned to 53,127 records. The dataset includes 8 environmental and soil-related features and 55 crop classes, making it a challenging multiclass classification problem. Data preprocessing involved removing missing values, normalizing features, encoding crop categories, and validating data consistency. The dataset was divided into training and testing sets with 37,188 and 10,626 samples, respectively; the remaining 5,313 records presumably form the validation set on which validation accuracy is reported.
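The split described above can be sketched as a two-stage hold-out. The source states only the training and test counts, so the fractions used here (20% for testing, then 12.5% of the remainder for validation), the function name two_stage_split, and the random seed are all assumptions, chosen because they reproduce the reported 37,188 and 10,626 samples. The sketch shuffles indices without stratifying by crop class, which may differ from what the authors did.

```python
import math
import random
from typing import List, Tuple


def two_stage_split(
    n_samples: int,
    test_frac: float = 0.2,          # assumed, not stated in the source
    val_frac_of_rest: float = 0.125,  # assumed, not stated in the source
    seed: int = 42,                   # hypothetical seed
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices and cut them into train/validation/test lists.

    Held-out sizes are rounded up, so the training set takes what is left.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = math.ceil(test_frac * n_samples)
    n_val = math.ceil(val_frac_of_rest * (n_samples - n_test))

    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = two_stage_split(53_127)
print(len(train_idx), len(val_idx), len(test_idx))  # 37188 5313 10626
```

Under these assumed fractions the three parts are disjoint and together cover all 53,127 cleaned records, which corresponds to roughly a 70/10/20 split.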
The KRISHIMITRA system utilizes XGBoost with grid search hyperparameter optimization, early stopping, and 5-fold cross-validation to ensure robust model performance and prevent overfitting. Explainability is integrated through feature importance analysis, allowing users to understand which factors influence crop recommendations.
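As a rough illustration of how grid search and 5-fold cross-validation fit together, the following standard-library sketch enumerates a parameter grid and averages a score over five folds. It is not the authors' code. The callback fit_and_score is a hypothetical placeholder for the step that would train an XGBoost classifier on the training indices (with early stopping, if used) and return validation accuracy. The grid keys are XGBoost parameter names, but the candidate values are invented for illustration.

```python
import itertools
import random
from statistics import mean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Params = Dict[str, float]
Scorer = Callable[[Params, List[int], List[int]], float]


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 42
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]


def grid_search_cv(
    param_grid: Dict[str, Sequence[float]],
    fit_and_score: Scorer,
    n_samples: int,
    k: int = 5,
) -> Tuple[Params, float]:
    """Return the parameter combination with the best mean k-fold score."""
    names = sorted(param_grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = [
            fit_and_score(params, train_idx, val_idx)
            for train_idx, val_idx in k_fold_indices(n_samples, k)
        ]
        avg = mean(fold_scores)
        if avg > best_score:  # ties keep the first combination seen
            best_params, best_score = params, avg
    return best_params, best_score


# Hypothetical candidate values; the source does not list the grid it searched.
PARAM_GRID = {
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
}
```

Passing the scoring step in as a callback keeps the search logic independent of any particular library, so the same loop works whether the model is XGBoost or one of the baselines.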
Experimental results demonstrate exceptional performance. The model achieved:
Training Accuracy: 99.46%
Validation Accuracy: 99.49%
Test Accuracy: 99.22%
Cross-Validation Accuracy: 99.36% ± 0.08%
The small train-test gap of 0.24 percentage points (99.46% training accuracy versus 99.22% test accuracy) indicates strong generalization and minimal overfitting. Feature importance analysis revealed that crop duration, water requirement, and relative humidity are the most influential factors in crop prediction, followed by the nutrient-related features nitrogen, phosphorus, and potassium.
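The arithmetic behind these summary figures can be sketched as follows. The accuracies are the values reported above. The helper names are hypothetical, and the use of the population standard deviation for the "±" term is an assumption, since the source does not say how the spread across folds was computed.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def generalization_gap(train_acc: float, test_acc: float) -> float:
    """Return train minus test accuracy, in percentage points."""
    return round(train_acc - test_acc, 2)


def summarize_cv(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of per-fold scores.

    Assumption: population rather than sample standard deviation.
    """
    return mean(fold_scores), pstdev(fold_scores)


# Accuracies as reported in the text, in percent.
reported = {"train": 99.46, "validation": 99.49, "test": 99.22}
gap = generalization_gap(reported["train"], reported["test"])
print(f"train-test gap: {gap:.2f} percentage points")  # train-test gap: 0.24 percentage points
```

Given the five per-fold accuracies, summarize_cv would return the two numbers that a figure such as 99.36% ± 0.08% summarizes.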
Comparison with other machine learning algorithms showed that XGBoost outperformed traditional approaches such as Naïve Bayes, Decision Trees, Random Forest, and Gradient Boosting, achieving the highest accuracy of approximately 99.34%. The confusion matrix and per-class F1-score analysis further confirmed that the model accurately distinguishes among the 55 crop categories, with only minor confusion between crops having similar environmental requirements.
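For readers unfamiliar with per-class F1 analysis, the sketch below derives one F1 score per class from a confusion matrix, using F1 = 2TP / (2TP + FP + FN). The 3-class matrix is a toy example invented for illustration and does not reproduce any result from the study, whose matrix would be 55 × 55. The function name per_class_f1 is likewise hypothetical.

```python
from typing import List


def per_class_f1(confusion: List[List[int]]) -> List[float]:
    """Compute F1 for each class of a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(confusion[c]) - tp                       # truly c, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy matrix for illustration only; not data from the paper.
toy_confusion = [
    [50, 2, 0],
    [1, 47, 2],
    [0, 3, 45],
]
for label, f1 in enumerate(per_class_f1(toy_confusion)):
    print(f"class {label}: F1 = {f1:.3f}")
```

Off-diagonal entries show which pairs of classes are confused with each other, which is how confusion between crops with similar environmental requirements would appear in the full matrix.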
Conclusion
This paper introduces KRISHIMITRA, an explainable and scalable crop recommendation system based on the XGBoost algorithm, to assist smart decision-making in precision agriculture. Using a large-scale dataset of more than 53,000 clean samples and 55 crop classes, the proposed model shows excellent predictive performance and scalability. XGBoost achieves a test accuracy of 99.22% and a cross-validation accuracy of 99.36% ± 0.08%, suggesting good generalisation and low overfitting.
The incorporation of explainable artificial intelligence is a major contribution of this study. The analysis of feature importance shows that crop duration, water requirement, and relative humidity are the most important characteristics in determining the suitability of a crop. This suggests that the model's predictions are not only accurate but also consistent with expert understanding of crop management, which makes the system more trustworthy and useful.
The comparison study also confirms that the proposed system outperforms both conventional machine learning approaches (Decision Trees, Naïve Bayes, and Support Vector Machines) and other ensemble methods (Random Forest and Gradient Boosting). XGBoost's ability to handle large datasets and interactions among features makes it well suited to agricultural applications.
Although the model is highly accurate, it can be further improved by using real-time data sources, such as IoT sensors, satellite images, and weather prediction systems. This can help in making context-aware and real-time recommendations, making it more valuable to farmers.
Overall, the KRISHIMITRA approach offers a highly accurate, transparent, and scalable crop recommendation system. It can play a crucial role in enhancing crop productivity, managing resources efficiently, and promoting sustainable agriculture in the precision farming era.